A dynamic attribute reduction algorithm based on relative neighborhood discernibility degree

Existing attribute reduction algorithms for incomplete hybrid decision systems suffer from low reduction efficiency and low classification accuracy, and they do not account for unlabeled data. To address these issues, this paper first redefines the weakly labeled relative neighborhood discernibility degree and develops a non-dynamic attribute reduction algorithm based on it. It then proposes an incremental update mechanism for the weakly labeled relative neighborhood discernibility degree and, building on that mechanism, introduces a new dynamic attribute reduction algorithm for the case in which the object set increases. The proposed algorithms are compared with two existing attribute reduction algorithms on 8 data sets from the UCI database. The results show that the proposed dynamic attribute reduction algorithm achieves higher attribute reduction efficiency and classification accuracy, which supports its effectiveness.


Basic knowledge
In this section, we cover the fundamental concepts of weakly labeled incomplete mixed decision systems and rough sets. For further details, please refer to references [35][36][37].
Definition 1 [30] A weakly labeled incomplete mixed decision system is denoted WDIS = ⟨U, C ∪ D, V, f⟩, where U = {x_1, x_2, ..., x_n} is a non-empty finite set of objects, also called the universe; C is a non-empty finite set of attributes with C = C_d ∪ C_r, where C_d is the set of discrete attributes and C_r is the set of continuous attributes; D is the decision attribute, with C ∩ D = ∅; V = ∪_{a∈C} V_a, where V_a is the set of all possible values of attribute a ∈ C; and f: U × C → V is the mapping that assigns a value to each attribute of each object, that is, ∀a ∈ C, ∀x ∈ U, f(x, a) ∈ V_a. At least one attribute a ∈ C has a missing value, written f(x, a) = *. The decision attribute d may also have missing values: ∃x_i ∈ U with d(x_i) = *, and such objects form the set of unlabeled objects. Therefore U = M ∪ L, where M is the set of unlabeled objects and L is the set of labeled objects.
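The structure in Definition 1 can be pictured with a small container. The following is a minimal sketch, not the authors' implementation: the class name `WDIS`, its field names, and the three-object data are all hypothetical and chosen only to show how U splits into L and M.

```python
from dataclasses import dataclass

MISSING = "*"  # marker for a missing condition or decision value, as in Definition 1


@dataclass
class WDIS:
    """Toy container for a weakly labeled incomplete mixed decision system."""
    objects: dict     # object name -> {attribute name: value or MISSING}
    discrete: set     # C_d, names of the discrete condition attributes
    continuous: set   # C_r, names of the continuous condition attributes
    decision: dict    # object name -> decision value or MISSING

    def labeled(self):
        """L: objects whose decision value is known."""
        return {x for x, d in self.decision.items() if d != MISSING}

    def unlabeled(self):
        """M: objects whose decision value is missing."""
        return {x for x, d in self.decision.items() if d == MISSING}


# Hypothetical data, invented for illustration only.
toy = WDIS(
    objects={
        "x1": {"c1": "a", "c2": 0.30},
        "x2": {"c1": MISSING, "c2": 0.35},
        "x3": {"c1": "b", "c2": MISSING},
    },
    discrete={"c1"},
    continuous={"c2"},
    decision={"x1": 1, "x2": 0, "x3": MISSING},
)

print(sorted(toy.labeled()), sorted(toy.unlabeled()))
```

Here L and M are derived from the decision values rather than stored, so U = M ∪ L holds by construction.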
Definition 2 [31] Given a weakly labeled incomplete mixed decision system WDIS = ⟨U, C ∪ D, V, f⟩, where B ⊆ C and B = B_d ∪ B_r, with B_d a discrete attribute set and B_r a continuous attribute set, for ∀a ∈ B the neighborhood tolerance relation of the attribute set B and the neighborhood tolerance class of x_i with respect to the attribute set B are defined as: where δ is the neighborhood radius, which is a non-negative constant.
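The defining formulas of Definition 2 are not reproduced in this text. A form often used for incomplete mixed data, and assumed here purely for illustration, treats two objects as tolerant on a discrete attribute when their values are equal or either is missing, and on a continuous attribute when their values differ by at most δ or either is missing. The function names and the toy data below are hypothetical.

```python
MISSING = "*"


def tolerant(x, y, discrete, continuous, delta):
    """Assumed tolerance test: True if x and y cannot be told apart on any attribute."""
    for a in discrete:
        if MISSING not in (x[a], y[a]) and x[a] != y[a]:
            return False
    for a in continuous:
        if MISSING not in (x[a], y[a]) and abs(x[a] - y[a]) > delta:
            return False
    return True


def neighborhood_class(name, objects, discrete, continuous, delta):
    """All objects tolerant with objects[name] on the given attributes."""
    xi = objects[name]
    return {other for other, xj in objects.items()
            if tolerant(xi, xj, discrete, continuous, delta)}


# Hypothetical data: c1 is discrete, c2 is continuous and already scaled to [0, 1].
objects = {
    "x1": {"c1": "a", "c2": 0.30},
    "x2": {"c1": MISSING, "c2": 0.35},
    "x3": {"c1": "b", "c2": 0.90},
}

for name in objects:
    print(name, sorted(neighborhood_class(name, objects, {"c1"}, {"c2"}, 0.2)))
```

Under this assumed relation a missing value never separates two objects, so the relation is reflexive and symmetric but not necessarily transitive.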
Theorem 1 above shows that the weakly labeled relative neighborhood discernibility degree is monotone in the attribute set: it does not increase as the attribute set grows. This monotonicity provides the theoretical basis for attribute reduction in weakly labeled incomplete decision systems, and a non-dynamic attribute reduction algorithm can be constructed from it.

Definition 6
Given a weakly labeled incomplete mixed decision system WDIS = ⟨U, C ∪ D, V, f⟩, for ∀a ∈ B ⊆ C, the significance of the internal attribute a is:
Definition 7 Given a weakly labeled incomplete mixed decision system WDIS = ⟨U, C ∪ D, V, f⟩, for ∀a ∈ C − B, the significance of the external attribute a is:
Definition 8 Given a weakly labeled incomplete mixed decision system WDIS = ⟨U, C ∪ D, V, f⟩, if R ⊆ C is an attribute reduction set, then R satisfies:
Ref. [22] presents a dynamic attribute reduction algorithm based on the discernibility matrix for an increasing object set, whereas Ref. [24] presents a dynamic attribute reduction algorithm for an increasing object set based on information entropy. However, these algorithms are designed only for a single, labeled data type and are not applicable to mixed or unlabeled data. To address this issue, this paper proposes a non-dynamic attribute reduction algorithm for weakly labeled incomplete mixed decision systems based on the weakly labeled relative neighborhood discernibility degree, as shown in Algorithm 1.
Algorithm 1. A non-dynamic attribute reduction algorithm for weakly labeled incomplete mixed decision systems.
Time complexity of non-dynamic Algorithm 1: Step 1 calculates the relative neighborhood discernibility degree of the entire attribute set, with time complexity O(|U||C|). Step 2 selects the most significant attribute in each loop and adds it to the candidate attribute set until the termination condition is satisfied, with time complexity O(|U|^2 |C|^3). Step 5 removes redundant attributes from the candidate attribute set, with time complexity O(|U|^2 |C|^2). Therefore, the time complexity of Algorithm 1 is O(|U|^2 |C|^3).
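The three phases described above (evaluate the full attribute set, add attributes greedily until the measure matches the full set, then remove redundant attributes) follow a common greedy pattern, sketched below. This is not the paper's Algorithm 1: `measure` is a hypothetical callable standing in for NDD_B(U), the selection rule (largest absolute change in the measure) is an assumption, and `toy_measure` is an invented stand-in that counts indistinguishable object pairs.

```python
from itertools import combinations


def greedy_reduct(attributes, measure):
    """Forward selection followed by redundancy removal.

    `measure` maps a frozenset of attributes to a number; it is a
    placeholder for NDD_B(U), whose formula is not given in this text.
    """
    full = measure(frozenset(attributes))  # value on the full attribute set C
    reduct = set()
    # Add the attribute that changes the measure most until it matches `full`.
    while measure(frozenset(reduct)) != full:
        current = measure(frozenset(reduct))
        candidates = [a for a in attributes if a not in reduct]
        best = max(candidates,
                   key=lambda a: abs(current - measure(frozenset(reduct | {a}))))
        reduct.add(best)
    # Drop any attribute whose removal still reproduces the full-set value.
    for a in sorted(reduct):
        if len(reduct) > 1 and measure(frozenset(reduct - {a})) == full:
            reduct.remove(a)
    return reduct


# Hypothetical discrete data; c3 carries the same partition as c1.
ROWS = {
    "x1": {"c1": 0, "c2": 0, "c3": 1},
    "x2": {"c1": 0, "c2": 1, "c3": 1},
    "x3": {"c1": 1, "c2": 1, "c3": 0},
    "x4": {"c1": 1, "c2": 0, "c3": 0},
}


def toy_measure(attrs):
    """Count object pairs that agree on every attribute in `attrs`."""
    return sum(all(ROWS[x][a] == ROWS[y][a] for a in attrs)
               for x, y in combinations(ROWS, 2))


print(sorted(greedy_reduct(["c1", "c2", "c3"], toy_measure)))
```

Each evaluation of the measure in this sketch scans all object pairs, which is the kind of repeated full recomputation that makes the non-dynamic approach costly when the data grow.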

Example analysis
To illustrate the non-dynamic attribute reduction algorithm (Algorithm 1) proposed in this paper for weakly labeled incomplete mixed decision systems, its feasibility is verified on the data in Table 2, with neighborhood radius δ = 0.2.
Table 2 represents a weakly labeled incomplete mixed decision system, where U = {x_1, x_2, ..., x_7}, L = {x_1, x_2, x_3, x_4, x_5} is the set of labeled objects, and M = {x_6, x_7} is the set of unlabeled objects. By Definition 2, the neighborhood tolerance classes of x_i with respect to C are:
According to step 4: NDD_R(U) ≠ NDD_C(U), so go to step 2. Continuing the calculation, for ∀c ∈ C − R, the calculated significance of external attributes is:
The attribute with the highest significance of external attributes is c_4, so R = R ∪ {c_4}. Now NDD_R(U) = 2, which satisfies NDD_R(U) = NDD_C(U), so go to step 5.
According to step 5: for ∀c ∈ R, the significance of internal attributes is calculated: sig_inner(c_4, R) = 0, sig_inner(c_5, R) = 0. R is then kept unchanged, and the attribute reduction set R = {c_4, c_5} of the system WDIS is output.

Dynamic attribute reduction algorithm
This section presents an incremental update mechanism that improves the efficiency of attribute reduction in a weakly labeled incomplete mixed decision system when the object set increases. The mechanism derives the weakly labeled relative neighborhood discernibility degree of the enlarged data from the degree already computed on the original data, instead of recomputing it from scratch. The attribute reduction of the system after the objects are added is then obtained from the updated degree. On this basis, the section proposes a new dynamic attribute reduction algorithm for an increasing object set.

Theorem 2 Given a weakly labeled incomplete mixed decision system WDIS = ⟨U, C ∪ D, V, f⟩ with neighborhood radius δ, suppose an object set Y is added to the system, and the new weakly labeled incomplete mixed decision system is denoted as […], where m is the newly added unlabeled object set and l is the newly added labeled object set. The incremental update formula for the weakly labeled relative neighborhood discernibility degree of B under U ∪ Y after the object set Y is added is:
Proof Let the set of added objects be denoted as Y = {y_1, y_2, ..., y_p}, and let Y_B(x) denote that, under the attribute set B, the new object y belongs to the neighborhood relation of […] It can be concluded that:
When the object set of a weakly labeled incomplete mixed decision system increases, the non-dynamic attribute reduction algorithm treats the enlarged data set as a new data set and repeats the attribute reduction from scratch, which consumes significant time and space. To address this issue, this paper proposes an incremental update mechanism for the relative neighborhood discernibility degree under an increased object set based on Theorem 2, and introduces a dynamic attribute reduction algorithm for the added object set, as shown in Algorithm 2.
Time complexity of dynamic Algorithm 2: Step 1 calculates the updated relative neighborhood discernibility degree, with time complexity O(|U ∪ Y||R′|). Step 4 gradually adds significant attributes to the candidate attribute subset until the termination condition is satisfied, with time complexity O(|U ∪ Y||R′||C − R′|). Step 6 removes the redundant attributes from the candidate attribute set, with time complexity O(|U ∪ Y||R′|^2). Therefore, the time complexity of dynamic Algorithm 2 is O(|U ∪ Y||R′||C|), which is effectively lower than the O(|U ∪ Y|^2 |C|^3) time complexity of non-dynamic Algorithm 1.
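The saving comes from reusing what was computed on U and touching only pairs that involve a new object. The sketch below illustrates that idea on neighborhood classes only. It is not the update formula of Theorem 2, which is not reproduced in this text; the tolerance test is an assumed form (missing values match anything, continuous values match within δ), and all names and data are hypothetical.

```python
MISSING = "*"


def tolerant(x, y, continuous, delta):
    """Assumed tolerance test: missing values match anything."""
    for a in x:
        if MISSING in (x[a], y[a]):
            continue
        if a in continuous:
            if abs(x[a] - y[a]) > delta:
                return False
        elif x[a] != y[a]:
            return False
    return True


def batch_neighborhoods(objects, continuous, delta):
    """Full recomputation over all pairs, used here only as a reference."""
    return {n: {m for m, y in objects.items() if tolerant(x, y, continuous, delta)}
            for n, x in objects.items()}


def add_objects(objects, neighborhoods, new_objects, continuous, delta):
    """Extend `objects` and `neighborhoods` in place as new objects arrive.

    Only pairs that involve a new object are compared; the neighborhoods
    among the original objects are reused unchanged.
    """
    for name, y in new_objects.items():
        objects[name] = y
        neighborhoods[name] = {name}
        for other, x in objects.items():
            if other != name and tolerant(x, y, continuous, delta):
                neighborhoods[other].add(name)
                neighborhoods[name].add(other)
    return neighborhoods


# Hypothetical original objects U and added objects Y.
base = {"x1": {"c1": "a", "c2": 0.30}, "x2": {"c1": MISSING, "c2": 0.35}}
new = {"y1": {"c1": "a", "c2": 0.40}, "y2": {"c1": "b", "c2": MISSING}}

hoods = batch_neighborhoods(base, {"c2"}, 0.2)
add_objects(base, hoods, new, {"c2"}, 0.2)  # base now also holds y1 and y2
print(hoods == batch_neighborhoods(base, {"c2"}, 0.2))
```

Each new object is compared once against the objects already present, so the work grows with |Y| times |U ∪ Y| rather than with |U ∪ Y| squared.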

Example analysis
To further illustrate the dynamic attribute reduction algorithm (Algorithm 2) proposed in this paper for weakly labeled incomplete mixed decision systems, its feasibility is validated using the original data from Table 2 together with the newly added object set shown in Table 3. From the preceding analysis, the attribute reduction set of the system WDIS is R = {c_4, c_5}, with NDD_R(U) = NDD_C(U) = 2.
Table 3. Increased object set.

Experimental description
The algorithms proposed in this paper were validated using 8 data sets from the UCI database [32]. Details of the datasets can be found in Table 4. The experiments were conducted on a computer with an Intel(R) Core(TM) i5-9500 CPU (3.00 GHz) and 8.0 GB of memory, running Windows 10 and MATLAB R2022a.
The eight datasets in Table 4 were subjected to min-max normalization to eliminate scale differences. In addition, to produce weakly labeled incomplete data, 20% of the attribute values and 20% of the decision values in each dataset were randomly replaced with missing values. Subsequently, the datasets were subjected to attribute reduction using the Semi-D algorithm [33], the Rsfs algorithm [34], non-dynamic Algorithm 1, and dynamic Algorithm 2. Since the Semi-D algorithm can only handle discrete data, the continuous data in the datasets were discretized for it. To evaluate the effectiveness of these four algorithms in attribute reduction, this study uses the classification accuracy under the RF classifier as the evaluation metric.
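The two preprocessing steps can be sketched as follows. The experiments were run in MATLAB; this Python sketch is only illustrative. The function names are hypothetical, and it assumes that "20% missing" means an exact 20% of the entries of each column are hidden, which the text does not specify.

```python
import random

MISSING = "*"


def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def inject_missing(values, rate, rng):
    """Replace a fraction `rate` of the entries with MISSING, chosen at random."""
    k = round(rate * len(values))
    hidden = set(rng.sample(range(len(values)), k))
    return [MISSING if i in hidden else v for i, v in enumerate(values)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
column = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical continuous attribute
labels = [0, 1, 0, 1, 1]             # hypothetical decision values

normalized = min_max_normalize(column)
masked_column = inject_missing(normalized, 0.2, rng)
masked_labels = inject_missing(labels, 0.2, rng)
print(normalized, masked_column, masked_labels)
```

Normalizing before hiding values keeps the minimum and maximum from depending on which entries happen to be removed.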

Comparison of performance of different algorithms when increasing object set
To verify the effectiveness of dynamic attribute reduction algorithm 2 when the object set is augmented, each dataset was first randomly shuffled. The datasets were originally sorted by decision value, so without shuffling the extracted base data could contain only one class. The dataset was then divided into two parts, with the first 50% as the base data and the remaining 50% as the incremental dataset. The incremental dataset was added cumulatively in steps of 10%: 10% of the incremental dataset for the first increase, 20% for the second, 30% for the third, and so on for 10 iterations.
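The splitting protocol can be written down compactly. The sketch below is an illustration under stated assumptions: the function name `incremental_splits`, the fixed `seed`, and the use of integer record indices are hypothetical, and the increases are taken to be cumulative prefixes of the incremental half.

```python
import random


def incremental_splits(records, steps=10, seed=0):
    """Shuffle, keep the first half as base data, and return cumulative increments.

    batches[k - 1] holds the first k/steps of the incremental half, so the
    k-th increase works on base + batches[k - 1].
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # avoid a single-class base set
    half = len(shuffled) // 2
    base, incremental = shuffled[:half], shuffled[half:]
    batches = [incremental[: len(incremental) * k // steps]
               for k in range(1, steps + 1)]
    return base, batches


records = list(range(100))  # hypothetical record identifiers
base, batches = incremental_splits(records)
print(len(base), [len(b) for b in batches])
```

After the 10th increase the working data are the base data plus the whole incremental half, that is, the full dataset.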
Fig. 1 compares the time taken by the Semi-D algorithm, the Rsfs algorithm, non-dynamic attribute reduction algorithm 1, and dynamic attribute reduction algorithm 2 for attribute reduction as the object set increases on 8 data sets. The horizontal axis represents the number of times the object set is increased, and the vertical axis represents the attribute reduction time in seconds (s).
Fig. 1 shows that the attribute reduction time of all four algorithms gradually increases as the object set grows in the weakly labeled incomplete mixed decision system. Notably, the attribute reduction time of dynamic attribute reduction algorithm 2 is significantly lower than that of the other three algorithms. For example, on the Dermatology dataset, when the object set was increased for the tenth time, the reduction times of the Semi-D algorithm, the Rsfs algorithm, non-dynamic attribute reduction algorithm 1, and dynamic attribute reduction algorithm 2 were 85.7269 s, 74.5495 s, 59.0794 s, and 30.6935 s, respectively. Relative to the other three algorithms, dynamic attribute reduction algorithm 2 reduced the time by 64.19%, 58.82%, and 48.05%, respectively. Similarly, on the Lymphography dataset, when the object set was increased for the fourth time, the reduction times of the four algorithms were 9.1935 s, 7.3605 s, 2.5512 s, and 0.9562 s, respectively, so dynamic attribute reduction algorithm 2 reduced the time by 89.59%, 87.01%, and 62.52%, respectively.
The dynamic attribute reduction algorithm in this paper is applicable not only to the small datasets described above but also to relatively large data. For example, on the Letter dataset, when the object set was increased for the 10th time, the reduction times of the Semi-D algorithm, the Rsfs algorithm, non-dynamic attribute reduction algorithm 1, and dynamic attribute reduction algorithm 2 were 118,967.0 s, 98,746.0 s, 78,934.0 s, and 36,786.0 s, respectively. Relative to the other three algorithms, dynamic attribute reduction algorithm 2 reduced the time by 69.08%, 62.75%, and 53.40%, respectively. Therefore, the attribute reduction time of dynamic algorithm 2 is significantly reduced for both large and small datasets.
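The quoted percentages are relative time savings, (t_other − t_dynamic) / t_other. The short check below recomputes them from the reported times; the function and variable names are hypothetical. The recomputed values agree with the reported ones to within about 0.01 percentage points, a difference consistent with rounding versus truncation.

```python
def percent_reduction(baseline, dynamic):
    """Relative time saving of `dynamic` over `baseline`, in percent."""
    return 100.0 * (baseline - dynamic) / baseline


# (times of Semi-D, Rsfs, Algorithm 1), time of Algorithm 2, reported savings
cases = {
    "Dermatology, 10th increase": (
        (85.7269, 74.5495, 59.0794), 30.6935, (64.19, 58.82, 48.05)),
    "Lymphography, 4th increase": (
        (9.1935, 7.3605, 2.5512), 0.9562, (89.59, 87.01, 62.52)),
    "Letter, 10th increase": (
        (118967.0, 98746.0, 78934.0), 36786.0, (69.08, 62.75, 53.40)),
}

gaps = {}
for name, (baselines, dynamic, reported) in cases.items():
    computed = [percent_reduction(b, dynamic) for b in baselines]
    gaps[name] = max(abs(c - r) for c, r in zip(computed, reported))
    print(name, [f"{c:.2f}" for c in computed], f"max gap {gaps[name]:.3f}")
```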
The results show that, after the object set is increased, dynamic attribute reduction algorithm 2 reduces the reduction time not only relative to the Semi-D and Rsfs algorithms but also, significantly, relative to non-dynamic attribute reduction algorithm 1. This is because non-dynamic attribute reduction algorithm 1, the Semi-D algorithm, and the Rsfs algorithm must recalculate the neighborhood classes and the weakly labeled relative neighborhood discernibility degree of the data after the object set is increased, whereas dynamic attribute reduction algorithm 2 uses an incremental update mechanism to calculate the relative neighborhood discernibility degree based on the attribute reduction result of the original data set. This saves a significant amount of reduction time and verifies the effectiveness of dynamic attribute reduction algorithm 2.
Figure 2 presents the number of attributes retained after reduction by the four algorithms (Semi-D algorithm, Rsfs algorithm, non-dynamic attribute reduction algorithm 1, and dynamic attribute reduction algorithm 2) when the object set is increased for the 1st, 4th, 7th, and 10th time across 8 data sets. It is evident from the figure that the […] When the object set is increased for the 7th time, the counts are 17, 15, 13, and 13, while for the 10th time they are 17, 16, 13, and 13. These results support the conclusion that dynamic attribute reduction algorithm 2 is more effective than the other algorithms.
Table 5 presents the classification accuracies of the Semi-D algorithm, the Rsfs algorithm, non-dynamic attribute reduction algorithm 1, and dynamic attribute reduction algorithm 2 under the RF classifier for eight datasets when the object set is increased for the 10th time. The results in Table 5 indicate that, for the 10th increase in the object set, the Dermatology and Ecoli datasets achieve the highest classification accuracy with non-dynamic attribute reduction algorithm 1. Conversely, the German, Breast Tissue, Lymphography, Ecoli, Ionosphere, Student Performance, and HCV datasets exhibit the highest classification accuracy with dynamic attribute reduction algorithm 2.
Notably, the average classification accuracy across all datasets favors dynamic algorithm 2. These findings support the claim that the proposed dynamic algorithm 2 achieves higher classification accuracy. It therefore provides a new attribute reduction algorithm with higher classification accuracy for the dynamic attribute reduction of weakly labeled incomplete mixed decision systems whose object set increases.

Conclusion
For the dynamic attribute reduction of weakly labeled incomplete hybrid decision systems, this paper first improves the definition of the weakly labeled relative neighborhood discernibility degree by using both the unlabeled and the labeled data, and proposes non-dynamic attribute reduction algorithm 1 based on it. It then constructs an incremental update mechanism for the weakly labeled relative neighborhood discernibility degree under an increased object set, and proposes dynamic attribute reduction algorithm 2. Finally, experiments verify that dynamic attribute reduction algorithm 2 significantly improves reduction efficiency, achieving shorter reduction times as well as higher classification accuracy. This verifies the effectiveness of dynamic algorithm 2 and provides a simpler and more accurate algorithm for the dynamic attribute reduction of weakly labeled incomplete hybrid decision systems. However, in real-world data the attribute set may change together with the object set. Building on the dynamic attribute reduction algorithm for an increasing object set in this paper, the next step will be to study attribute reduction when the attribute set and the object set change at the same time in weakly labeled incomplete hybrid decision systems.

Figure 1. Time taken for attribute reduction by the four algorithms as the object set increases.

Figure 2. Number of attributes retained after reduction by the four attribute reduction algorithms as the object set increases.

Weakly labeled incomplete mixed decision systems. Column headers: U, c_1, c_2, c_3, c_4, c_5, D.

Table 2. Weakly labeled incomplete mixed decision systems.

Table 4. Data set table.

Table 5. Classification accuracy of the four algorithms under the RF classifier when the object set is increased.